How does music relate to the lyrics? It is tempting to think that a song tries to convey some feeling or emotion, and that both the music and the lyrics are there to support this message. Let me give you an example. We might expect a song with a slow beat and a laid-back guitar to cover laid-back topics, maybe a trip to the beach. At the other end of the spectrum, heavy metal would likely concern itself with darker, heavier subjects. But are these suspicions even true? Let’s put some numbers to the hypothesis that there is, in fact, a relationship between music and lyrics. In the next sections I’ll take you on a journey where we approach this topic with a statistical mindset, harnessing all the powers that modern technology has to offer along the way.
We’ll start by picking a large body of music and, for each track in it, collecting and storing the lyrics. It would be far too cumbersome to scrape all the lyrics from the internet myself, but fortunately the Musixmatch API allows querying lyrics from code in a single API call. For an unpaid account only 30% of the lyrics for a queried track are returned, but that will do for our purposes. I assign every set of lyrics a sentiment, or valence, score automatically using the NLTK package, which offers natural language processing functionality. A low score indicates a sad feeling, whereas a high score indicates a happy one.
Once the lyrics have a numerical score, we can start to answer our question: how does music relate to lyrics?
The research question is still a bit broad. We have settled on how to analyze the lyrics, but not yet on which aspects of the music we’ll focus on. We are going to keep the research broad and explore how the lyrics relate to the four main elements of music: melody, harmony, instrumentation and rhythm. To access and preprocess the musical properties we use the Spotify API. For each element we will either confirm or refute hypotheses that intuitively make a lot of sense, but are not (yet) backed up by data.
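As a sketch of what pulling these properties looks like, here is how a single track’s audio features could be fetched and trimmed using only the Python standard library. The endpoint is Spotify’s documented audio-features route; the track ID and OAuth token are placeholders, and a real script would also handle authentication and rate limits:

```python
# Sketch of pulling a track's audio features from the Spotify Web API
# with only the standard library. Token and track ID are placeholders.
import json
import urllib.request

API = "https://api.spotify.com/v1/audio-features/"

def fetch_features(track_id: str, token: str) -> dict:
    """One API call per track; returns the raw features JSON."""
    req = urllib.request.Request(
        API + track_id, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def summarize(features: dict) -> dict:
    """Keep only the high-level properties this post uses."""
    wanted = ("valence", "energy", "tempo", "key", "mode")
    return {k: features[k] for k in wanted}

# Live usage would look like (placeholders, not real credentials):
#   print(summarize(fetch_features("TRACK_ID", "OAUTH_TOKEN")))
```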
Let us dive into it!
The first order of business is choosing the corpus of music. We have chosen a broad research question and the corpus should reflect this: it must draw inspiration from various genres and contain a large number of songs; only then can we justify general conclusions. Because the Musixmatch and Spotify APIs do the heavy lifting of fetching the data we need, this is most certainly possible.
The exhaustive list of albums that are included in this research:
Elephant, Madvillainy, …Like Clockwork, Street Worms, Midnights, HEROES & VILLAINS, St. Elsewhere, The White Album, Plastic Beach, Demon Days, Thriller, In the Aeroplane Over the Sea, Hawaii: Part II, WHEN WE ALL FALL ASLEEP, WHERE DO WE GO?, Dua Lipa, The Money Store, OFFLINE!, OK Computer and Rumours
This totals over 280 tracks and 16 hours of listening time.
The Spotify API offers functionality that ranges from very high level to very low level. Here we will use some of the high-level analyses, like valence and energy, to learn about the corpus.
When we plot energy, musical valence and lyrical valence against each other, we find something enormously interesting: even though energy and valence do not seem related, musical and lyrical valence appear highly correlated.
Intuitively, it would appear that melody encodes a lot of the valence information of a song. The melody is usually the most memorable part and often indicative of the feel of a song. So it makes sense to look at the melodies of two tracks, one with low and one with high lyrical valence. A natural visualization tool here is a chromagram, which captures, for each moment, the notes that are played, as analyzed using the Fourier transform. Let’s try this and see if any melody lines become apparent.
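Libraries like librosa ship ready-made chromagram functions, but the core idea fits in a few lines of numpy: frame the signal, take the FFT of each frame, and fold each bin’s energy into one of twelve pitch classes. In this sketch a synthetic A4 tone stands in for real track audio:

```python
# Minimal chromagram sketch in plain numpy. A synthetic 440 Hz sine
# wave stands in for real track audio.
import numpy as np

def chromagram(y, sr, n_fft=2048, hop=512):
    """Return a (12, n_frames) array of per-pitch-class energy (C = 0)."""
    frames = [y[i:i + n_fft] for i in range(0, len(y) - n_fft, hop)]
    chroma = np.zeros((12, len(frames)))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # Map each FFT bin to a pitch class (C = 0, ..., A = 9, B = 11).
    midi = 69 + 12 * np.log2(np.maximum(freqs, 1e-9) / 440.0)
    classes = np.round(midi).astype(int) % 12
    window = np.hanning(n_fft)
    for t, frame in enumerate(frames):
        mag = np.abs(np.fft.rfft(frame * window))
        mag[freqs < 20] = 0.0  # drop sub-audible bins
        np.add.at(chroma[:, t], classes, mag)
    return chroma

sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)  # one second of A4
c = chromagram(tone, sr)
print(c.sum(axis=1).argmax())  # 9, the pitch class of A
```

Plotting this matrix as a heatmap (pitch class on the y-axis, time on the x-axis) gives the chromagrams discussed below.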
Unfortunately, looking at the chromagrams, no discernible melody is recognizable. The only thing that sticks out is the droning ‘E’ in Ball and Biscuit, but that could hardly be called a melody. It appears we need a different tool.
Though finding specific melodies is a difficult task to automate, we could look at the key in which the melody is played.
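Spotify’s audio features conveniently report a track’s estimated key as a pitch-class integer plus a major/minor mode flag; a small helper (a sketch, following Spotify’s documented encoding) turns those into readable names:

```python
# Turn Spotify's `key` (pitch class, 0 = C, -1 = undetected) and
# `mode` (1 = major, 0 = minor) integers into readable key names.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def key_name(key: int, mode: int) -> str:
    if key < 0:
        return "unknown"
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"

print(key_name(4, 0))  # E minor
print(key_name(0, 1))  # C major
```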
The troubles with key matching.
Hypothesis: higher tempo songs tend to be more aggressive and slower songs more sensual.
Let’s put this one to the test. For this hypothesis we’ll label songs with a BPM below the median (< 115.9 BPM) as slow songs, and the rest (≥ 115.9 BPM) as fast songs.
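The split itself is a one-liner around the median; the BPM list below is a placeholder for the corpus’s real per-track tempos:

```python
# Split tracks into slow/fast around the median tempo. The BPM values
# are hypothetical placeholders for the real corpus data.
from statistics import median

bpms = [92.0, 104.5, 110.0, 121.3, 128.0, 140.2]
cutoff = median(bpms)

slow = [b for b in bpms if b < cutoff]
fast = [b for b in bpms if b >= cutoff]
print(cutoff, len(slow), len(fast))
```

By construction this yields two roughly equal-sized groups, which keeps the comparison fair.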
So far we’ve explored only the lyrical valence property, but not the lyrics themselves. We might gain some new insights if we look at the lyrics directly, so let’s try it. One of the most useful tools for visualizing patterns in textual data is a so-called word cloud, which you can see to the side. The words in blue occur much more frequently in fast songs than in slow songs, and vice versa for the red words.
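The coloring boils down to comparing per-word frequencies between the two groups. A sketch using a smoothed log-ratio, where the two lyric strings are made-up placeholders for the concatenated fast and slow lyrics:

```python
# Score each word by how characteristic it is of fast vs. slow songs:
# positive -> fast, negative -> slow. Inputs are invented placeholders.
import math
from collections import Counter

def log_ratio(fast_text: str, slow_text: str, smoothing: float = 1.0):
    """Smoothed log-ratio of word frequencies between two corpora."""
    fast = Counter(fast_text.lower().split())
    slow = Counter(slow_text.lower().split())
    words = set(fast) | set(slow)
    return {w: math.log((fast[w] + smoothing) / (slow[w] + smoothing))
            for w in words}

scores = log_ratio("kill gun run run kill", "kiss boy hot kiss number")
print(scores["kill"] > 0, scores["kiss"] < 0)  # True True
```

These scores can then drive both the size and the blue/red coloring in a word-cloud library.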
Immediately we can see instances that support the hypothesis. Slow-song words include sensual ones such as number (as in, someone’s phone number), kiss, boy and hot; these are words we would expect to encounter in a love song. What stands out, though, is that love lands among the fast songs, and there are some odd ones out like bones. As for the fast tracks, we also find what one would expect: aggressive words like kill, gun and ill. Very noticeably, we also find numerous verbs and filler words. This makes sense: in a high-BPM track the singer (or rapper) has to keep up the pace, and reusing common verbs and filler words keeps the information stream manageable for both artist and listener.
Most of the data in this plot seems to confirm the hypothesis (though there are exceptions, like love among the fast tracks).